Draft: not for distribution
This analysis examines neighborhood characteristics and housing market conditions in Cuyahoga County’s 1,161 block groups between 2017 and 2019. We began by gathering parcel- and address-level data from several sources. Property records obtained from the Cuyahoga County Fiscal Officer are our primary data source; we also incorporated information from HUD, the Cuyahoga Land Bank, and other sources.
At first blush, it might seem to make the most sense to examine housing markets for a single year. We investigated this possibility and found that in many block groups, even those composed primarily of residential properties, there was simply too little sales activity to obtain stable, meaningful estimates of current market conditions. In any given year, many block groups had between zero and five arm’s-length property transfers.
By pooling information from three consecutive years, we are able to get a fuller, more accurate picture of market conditions in Cuyahoga County.
After aggregating property records to the block group level, we dropped from the analysis any block groups with fewer than five arm’s-length transfers during 2017-19.
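The pooling and filtering step could be sketched in pandas roughly as follows. This is a minimal illustration and not the code used for this analysis. The column names `block_group`, `sale_year`, and `arms_length` are hypothetical stand-ins for whatever fields the Fiscal Officer’s transfer records actually use, and `eligible_block_groups` is a name introduced here.

```python
import pandas as pd

MIN_TRANSFERS = 5  # block groups with fewer pooled 2017-19 arm's-length sales are dropped


def eligible_block_groups(sales: pd.DataFrame, min_transfers: int = MIN_TRANSFERS) -> pd.Index:
    """Return block groups with at least `min_transfers` arm's-length sales in 2017-19.

    Assumes hypothetical columns: 'block_group', 'sale_year', 'arms_length' (bool).
    """
    in_window = sales["sale_year"].between(2017, 2019)  # pool the three years (inclusive)
    al_sales = sales[in_window & sales["arms_length"]]   # keep arm's-length transfers only
    counts = al_sales.groupby("block_group").size()      # pooled count per block group
    return counts[counts >= min_transfers].index


# Toy data: block group "A" has 6 pooled arm's-length sales in 2017-19, "B" only 3.
sales = pd.DataFrame({
    "block_group": ["A"] * 7 + ["B"] * 4,
    "sale_year": [2017, 2017, 2018, 2018, 2019, 2019, 2016, 2017, 2018, 2019, 2019],
    "arms_length": [True] * 7 + [True, True, True, False],
})
print(list(eligible_block_groups(sales)))  # ['A']
```

The point of pooling first and counting second is that in the toy data, neither block group reaches five sales in any single year. Block group A clears the threshold only once 2017-19 are combined.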
We then selected the variables shown in Table 1, which together offer a well-rounded view of a neighborhood and its market conditions.
| Variable | Definition |
|---|---|
| mn_alpricesft_adj | Mean arm’s-length sales price per square foot (inflation adjusted) |
| pct_w_sales | Percent of residential properties that sold at least once from 2017-19 |
| var_alpricesft | Variation in sales price per square foot (coefficient of variation: standard deviation of price / mean price) |
| pct_foreclosed | Percent of residential properties that went through the foreclosure process |
| pct_pv | Percent of residential properties with a postal vacancy for at least one quarter |
| res_density | Residential density: residential properties per square mile |
| pct_own_occ | Percent of single-family, multi-family, and condo parcels claiming the owner-occupant tax credit |
| med_yrbuilt | Median year built of residential properties |
| med_sqft_per_unit | Median square feet per housing unit (i.e., for a double, total square footage / 2) |
| mn_bath_perunit | Average number of baths per housing unit |
| mn_beds_perunit | Average number of beds per housing unit |
| pct_sf | Percent of residential properties that are single family homes |
| pct_mf | Percent of residential properties that are multi-family homes (2-3 housing units) |
| pct_good | Percent of residential properties assessed as in ‘Good’ condition or better by the county auditor |
| ct_demos | Count of residential demolitions from 2017-19 |
| comm_density | Commercial density: commercial properties per square mile |
| hcv_density | Housing choice voucher density: number of housing units that accepted a housing choice voucher, per square mile |
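Most of the Table 1 variables are simple shares or per-square-mile rates, but var_alpricesft is a ratio worth spelling out. The sketch below reads its definition as the coefficient of variation of arm’s-length price per square foot within a block group, although the table’s wording could also be read as applying to total price. A pandas version might look like this. The column names and the function names `price_variation` and `density` are hypothetical.

```python
import pandas as pd


def price_variation(al_sales: pd.DataFrame) -> pd.Series:
    """var_alpricesft: std. dev. of price per sq ft divided by its mean, per block group.

    Assumes hypothetical columns 'block_group' and 'price_per_sqft' on arm's-length sales.
    """
    grouped = al_sales.groupby("block_group")["price_per_sqft"]
    return grouped.std() / grouped.mean()  # pandas std() uses the sample (ddof=1) formula


def density(counts: pd.Series, area_sq_mi: pd.Series) -> pd.Series:
    """Properties (or voucher units) per square mile, aligned on block group index."""
    return counts / area_sq_mi


al_sales = pd.DataFrame({
    "block_group": ["A", "A", "A", "B", "B"],
    "price_per_sqft": [50.0, 100.0, 150.0, 80.0, 80.0],
})
print(price_variation(al_sales).round(2).to_dict())  # {'A': 0.5, 'B': 0.0}
```

Dividing by the mean makes the measure comparable across block groups with very different price levels. A $20 spread means much more where homes sell for $50 per square foot than where they sell for $200.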
The selected variables are related to one another in different, and often multiple, ways. Some of the relationships are obvious: for example, homes with many bedrooms tend to have many bathrooms. In other cases, the connections are less straightforward, or may be driven by the interrelationships of several variables together.
For example, pockets of infill development in areas like Ohio City/Tremont might be characterized by relatively new homes (med_yrbuilt), high commercial and residential density (comm_density; res_density), and low rates of homeownership (pct_own_occ). At the same time, newer homes might also be an important feature of the county’s outlying suburbs (think Strongsville or North Royalton). In this case, however, a high med_yrbuilt would be associated with low commercial and residential density and high homeownership rates. Principal component analysis (PCA) is a method for teasing out these sorts of underlying relationships and distilling them into a much smaller number of combined variables (often referred to as components or factors).
We ran a PCA on the variables in Table 1 and identified four components that describe different underlying neighborhood dimensions. Together, these four dimensions account for approximately 80% of the variance in the original 17 variables.
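The following minimal scikit-learn sketch illustrates this step. It standardizes the variables, keeps four components, and reports the share of variance they retain. It runs on synthetic data. The document does not say which software was used or whether the variables were standardized, though standardizing is the usual choice when inputs are on such different scales (dollars per square foot, percentages, counts per square mile). The function name `fit_pca` is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def fit_pca(X: np.ndarray, n_components: int = 4):
    """Standardize the block-group variables, fit PCA, and return scores plus diagnostics.

    X: (n_block_groups, n_variables) array, e.g. the 17 Table 1 variables.
    """
    Z = StandardScaler().fit_transform(X)      # put every variable on a common scale
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(Z)              # component scores for each block group
    cum_var = pca.explained_variance_ratio_.sum()  # share of total variance retained
    # Loadings: eigenvectors scaled by sqrt(eigenvalue); for standardized inputs
    # these approximate the correlations between variables and components.
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
    return scores, cum_var, loadings


rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 4))        # four hidden "neighborhood dimensions"
mixing = rng.normal(size=(4, 17))         # each observed variable blends them
X = latent @ mixing + 0.3 * rng.normal(size=(500, 17))  # plus measurement noise
scores, cum_var, loadings = fit_pca(X)
print(scores.shape, loadings.shape, round(cum_var, 2))
```

On the real data, `cum_var` is the quantity that the roughly 80% figure above describes. The four columns of `scores` are the combined measures carried forward into the next step.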
| Variable | Component 1 | Component 2 | Component 3 | Component 4 |
|---|---|---|---|---|
| mn_alpricesft_adj | -0.74 | |||
| pct_w_sales | 0.82 | |||
| var_alpricesft | 0.85 | |||
| pct_foreclosed | 0.9 | |||
| pct_pv | 0.85 | |||
| res_density | 0.46 | 0.64 | ||
| pct_own_occ | -0.76 | |||
| med_yrbuilt | -0.78 | -0.35 | ||
| med_sqft_per_unit | 0.95 | |||
| mn_bath_perunit | 0.76 | |||
| mn_beds_perunit | 0.96 | |||
| pct_sf | -0.44 | 0.72 | ||
| pct_mf | 0.74 | -0.36 | ||
| pct_good | 0.61 | -0.36 | ||
| ct_demos | 0.9 | |||
| comm_density | 0.43 | -0.33 | ||
| hcv_density | 0.65 |
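The loadings above show only values of magnitude 0.33 or larger, and several exceed 0.9. This pattern is consistent with a rotated solution in which small loadings are suppressed for readability, though the text doesn’t confirm it. If a varimax rotation was used, it could be reproduced with a standard NumPy implementation such as the sketch below. Both function names are hypothetical, and so is the 0.3 display cutoff in `suppress_small`.

```python
import numpy as np


def varimax(loadings: np.ndarray, gamma: float = 1.0, max_iter: int = 100,
            tol: float = 1e-6) -> np.ndarray:
    """Orthogonal varimax rotation of a (n_variables, n_components) loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)  # start from the unrotated solution
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # SVD step of the classic varimax algorithm
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        )
        R = u @ vt  # nearest orthogonal rotation
        d_old, d = d, np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:  # criterion stopped improving
            break
    return loadings @ R


def suppress_small(loadings: np.ndarray, cutoff: float = 0.3) -> np.ndarray:
    """Round loadings and blank (NaN) those below `cutoff` in magnitude, as in a report table."""
    shown = np.round(loadings, 2)
    shown[np.abs(loadings) < cutoff] = np.nan
    return shown


theta = np.pi / 6  # blur a clean two-factor structure with a 30-degree rotation
Q = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.85]])
print(suppress_small(varimax(simple @ Q)))
```

Because the rotation is orthogonal, each variable’s communality (the sum of its squared loadings) is unchanged. Rotation only redistributes the variance so that each variable loads strongly on as few components as possible.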
The next step was to use these four neighborhood dimensions to categorize block groups into different and meaningful neighborhood ‘types.’
This analysis identified 9 neighborhood types.
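The document doesn’t name the clustering algorithm. The ‘uncertainty score’ described next matches what model-based clustering reports (for example, Gaussian mixture models as fit by the R package mclust). The sketch below therefore assumes a Gaussian mixture fit to the four component scores, using scikit-learn’s `GaussianMixture`, and chooses the number of types by BIC. Both the method and the BIC criterion are assumptions here, not a description of how the nine types were actually selected. `choose_n_types` is a hypothetical helper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def choose_n_types(scores: np.ndarray, k_range=range(2, 13), seed: int = 0):
    """Fit a Gaussian mixture for each k and return the k with the lowest BIC, plus its model."""
    best_k, best_bic, best_model = None, np.inf, None
    for k in k_range:
        gm = GaussianMixture(n_components=k, covariance_type="full",
                             n_init=3, random_state=seed)
        gm.fit(scores)
        bic = gm.bic(scores)  # in scikit-learn, lower BIC is better
        if bic < best_bic:
            best_k, best_bic, best_model = k, bic, gm
    return best_k, best_model


rng = np.random.default_rng(1)
# Synthetic stand-in for component scores: three well-separated groups in 4-D.
centers = np.array([[0, 0, 0, 0], [8, 0, 0, 0], [0, 8, 0, 0]], dtype=float)
scores = np.vstack([c + rng.normal(size=(200, 4)) for c in centers])
k, model = choose_n_types(scores, k_range=range(1, 7))
labels = model.predict(scores)  # hard assignment of each observation to a type
print(k)
```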
The clustering algorithm reports several pieces of information that shed light on how it arrived at a particular solution. One is an ‘uncertainty score’ for each observation (block group). This is a number between 0 and 1. Scores close to 0 reflect a high degree of confidence that a block group was placed in the correct cluster. Scores closer to 1 indicate that a block group shares features associated with two or more clusters and doesn’t fit neatly into any one of them.
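If the clustering is a Gaussian mixture (the text doesn’t say), the uncertainty score is usually defined as one minus an observation’s largest posterior membership probability. This is how mclust computes it. A minimal NumPy version, with the hypothetical name `uncertainty`:

```python
import numpy as np


def uncertainty(posteriors: np.ndarray) -> np.ndarray:
    """1 - largest posterior membership probability, per observation (row).

    posteriors: (n_obs, n_clusters) array whose rows sum to 1,
    e.g. the output of scikit-learn's GaussianMixture.predict_proba.
    """
    return 1.0 - posteriors.max(axis=1)


post = np.array([
    [0.98, 0.01, 0.01],   # clearly belongs to the first cluster
    [0.40, 0.35, 0.25],   # straddles several clusters
])
print(uncertainty(post))
```

One consequence of this definition is that with G clusters, the score can be at most 1 - 1/G. With nine types, no block group could score above about 0.89.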
In this map, for each neighborhood type we selected the 40 strongest examples of that type, i.e., the 40 block groups with the lowest uncertainty scores.
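Selecting the strongest examples of each type is a group-wise top-n query. A pandas sketch, assuming a hypothetical block-group table with columns `geoid`, `nbhd_type`, and `uncertainty`:

```python
import pandas as pd


def strongest_examples(bg: pd.DataFrame, n: int = 40) -> pd.DataFrame:
    """Keep the n block groups with the lowest uncertainty within each neighborhood type.

    Assumes hypothetical columns 'geoid', 'nbhd_type', and 'uncertainty'.
    """
    return (
        bg.sort_values("uncertainty")   # most confident assignments first
          .groupby("nbhd_type")
          .head(n)                      # first n rows of each type
    )


bg = pd.DataFrame({
    "geoid": [f"bg{i}" for i in range(6)],
    "nbhd_type": [1, 1, 1, 2, 2, 2],
    "uncertainty": [0.30, 0.05, 0.10, 0.50, 0.01, 0.20],
})
top = strongest_examples(bg, n=2)
print(sorted(top["geoid"]))  # ['bg1', 'bg2', 'bg4', 'bg5']
```

A type with fewer than 40 block groups would simply contribute all of them.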